Live freelance tracking. Raw descriptions turned into structured data. Find your next tech project without the noise.
upwork.com 🟢 2026-05-14
🔹 [Format] Convert PDF to structured Excel
👤 Client: 🇨🇦 Canada, member since 2021-01-06
💰 Price: ****
🚩 Problem: Automate the extraction of data from various PDF formats into clean, structured Excel files.
📦 Existing: Not specified
Specifications:
[Target] Extract data from readable and scanned (OCR) PDFs
[Method] Use Python with OCR libraries for text recognition on scanned PDFs and PDF parsing tools for readable PDFs
[UI/UX] Not applicable - backend system only
[Stack] Python, Tesseract/PaddleOCR/Google Vision/AWS Textract, PyPDF2, openpyxl
[Security] Ensure data privacy during extraction and storage
[Format] Output structured Excel files
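One way to cover both readable and scanned PDFs is to try the embedded text layer first and fall back to OCR only when it comes back empty or nearly so. The sketch below is an illustration under assumptions, not the client's design: `MIN_TEXT_CHARS`, `needs_ocr`, and `page_text` are hypothetical names, the 20-character threshold is arbitrary, and the OCR engine is passed in as a callable so any of the listed services could sit behind it.

```python
from typing import Callable, Optional, Tuple

# Hypothetical threshold: a page whose text layer yields fewer characters
# than this is treated as a scanned image and sent to OCR.
MIN_TEXT_CHARS = 20


def needs_ocr(text_layer: Optional[str], min_chars: int = MIN_TEXT_CHARS) -> bool:
    """Return True when the embedded text layer is missing or too short."""
    return len((text_layer or "").strip()) < min_chars


def page_text(text_layer: Optional[str], run_ocr: Callable[[], str]) -> Tuple[str, str]:
    """Pick the text source for one page.

    text_layer: whatever the PDF parser returned for the page, for example
                the result of PyPDF2's page.extract_text().
    run_ocr:    zero-argument callable wrapping the chosen OCR engine
                (Tesseract, PaddleOCR, Google Vision, or AWS Textract).
    Returns (text, source), where source is "text-layer" or "ocr".
    """
    if needs_ocr(text_layer):
        return run_ocr(), "ocr"
    return text_layer.strip(), "text-layer"
```

Calling `page_text("", lambda: "recognized text")` takes the OCR branch, while a page with a usable text layer never triggers the OCR call, which keeps paid OCR services off pages that do not need them.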
Workflow:
1. Define the schema for the target Excel format.
2. Implement OCR functionality to handle scanned PDFs.
3. Develop a PDF parsing module using PyPDF2 or similar libraries.
4. Integrate the data extraction logic with the OCR and PDF parsing modules.
5. Automate Excel file creation using openpyxl.
6. Test the system on various PDF formats and layouts.
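Steps 1 and 5 of the workflow above could look roughly like the following sketch. The column names in `SCHEMA`, the sheet title, and the helper names `to_row` and `write_workbook` are assumptions made for illustration; the actual columns would come from the client's target Excel format, which the listing does not specify.

```python
from typing import Any, Callable, Dict, List

# Hypothetical target schema: column header -> converter for the raw string.
# The real columns would be taken from the client's Excel template.
SCHEMA: Dict[str, Callable[[str], Any]] = {
    "Invoice No": str.strip,
    "Date": str.strip,
    "Amount": lambda s: float(s.replace(",", "")),
}


def to_row(record: Dict[str, str]) -> List[Any]:
    """Order one extracted record by the schema.

    Missing or unparseable values become None so the cell stays empty
    instead of holding a guessed value.
    """
    row: List[Any] = []
    for column, convert in SCHEMA.items():
        raw = record.get(column)
        try:
            row.append(convert(raw) if raw is not None else None)
        except ValueError:
            row.append(None)
    return row


def write_workbook(records: List[Dict[str, str]], path: str) -> None:
    """Write a header row plus one row per record to an .xlsx file."""
    # Imported here so the schema code above runs without openpyxl installed.
    from openpyxl import Workbook

    wb = Workbook()
    ws = wb.active
    ws.title = "Extracted"
    ws.append(list(SCHEMA))  # header row, in schema order
    for record in records:
        ws.append(to_row(record))
    wb.save(path)
```

Keeping the schema in one mapping means the header row and the cell order cannot drift apart, and a new PDF layout only needs a new extractor that produces the same record keys.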